NSF PAR Search | NSF Public Access Repository

Enhancing Subsea ROV Operations: A Survey of Operator Challenges, Essential Skills, and Novel Teleoperation Technologies

Xia, P; McSweeney, K; Crippen, K; Du, J (November 2025, Marine Technology Society Journal)

The use of remotely operated vehicles (ROVs) has become essential across subsea industries such as oil and gas exploration and offshore wind energy, yet significant challenges remain in achieving effective human-ROV interaction. Despite advancements, ROV operations are hindered by complex control systems, high physical and cognitive demands on pilots, and a lack of sensory feedback mechanisms that fully convey the dynamics of the underwater environment. This study addresses these gaps by surveying ROV pilots and industry stakeholders to identify prevalent operational challenges, essential skills, and perspectives on integrating novel teleoperation technologies, including mixed reality and haptic feedback. The findings reveal strong industry interest in technologies that enhance situational awareness and ease control demands, although concerns remain about practical integration and operator fatigue. By highlighting the critical skills required and the potential benefits of human-centered augmentation systems, this study provides insights to inform future ergonomic designs, training frameworks, and technology development aimed at advancing safe and effective ROV teleoperation.
Free, publicly-accessible full text available November 1, 2026
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Xia, P; Zhu, K; Li, H; Wang, T; Shi, W; Wang, S; Zhang, L; Zou, J; Yao, H (April 2025, ICLR)

Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. Yet the limited amount of high-quality data and the distribution shifts between training data and deployment data restrict the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches do not generalize sufficiently across medical domains and can cause misalignment, both between modalities and between the model and the ground truth. In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. Our approach introduces a domain-aware retrieval mechanism, an adaptive method for selecting retrieved contexts, and a provable RAG-based preference fine-tuning strategy. These innovations make the RAG process sufficiently general and reliable, significantly improving alignment when retrieved contexts are introduced. Experimental results across five medical datasets (covering radiology, ophthalmology, and pathology) on medical VQA and report generation show that MMed-RAG achieves an average improvement of 43.8% in the factual accuracy of Med-LVLMs.
Free, publicly-accessible full text available April 24, 2026
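The abstract names a domain-aware retrieval mechanism and an adaptive retrieved-context selection step without giving their details. As a rough illustration only, the sketch below assumes that a domain classifier routes each query to a per-domain retriever, and that the ranked list is truncated at the first point where the similarity score drops sharply between consecutive results. The ratio rule, the `ratio_threshold` and `max_k` values, and all function names are hypothetical; this is not necessarily MMed-RAG's exact procedure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical retriever type: maps a query to (context, similarity) pairs
# sorted by descending similarity.
Retriever = Callable[[str], List[Tuple[str, float]]]


def adaptive_select(
    ranked: Sequence[Tuple[str, float]],
    ratio_threshold: float = 1.5,
    max_k: int = 10,
) -> List[str]:
    """Keep ranked contexts until the similarity score drops sharply.

    Assumed rule (illustrative): stop at the first position where the ratio
    of the previous score to the current score exceeds ratio_threshold,
    or once max_k contexts have been kept.
    """
    if not ranked:
        return []
    selected = [ranked[0][0]]
    for (_, prev_score), (text, score) in zip(ranked, ranked[1:]):
        if len(selected) >= max_k:
            break
        # A non-positive score or a large ratio marks the cutoff point.
        if score <= 0 or prev_score / score > ratio_threshold:
            break
        selected.append(text)
    return selected


def domain_aware_retrieve(
    query: str,
    classify_domain: Callable[[str], str],
    retrievers: Dict[str, Retriever],
    ratio_threshold: float = 1.5,
    max_k: int = 10,
) -> List[str]:
    """Route the query to the retriever for its predicted domain, then truncate."""
    domain = classify_domain(query)
    ranked = retrievers[domain](query)
    return adaptive_select(ranked, ratio_threshold=ratio_threshold, max_k=max_k)


if __name__ == "__main__":
    def toy_classifier(q: str) -> str:
        # Hypothetical stand-in for a trained domain classifier.
        return "radiology" if "x-ray" in q.lower() else "pathology"

    toy_retrievers: Dict[str, Retriever] = {
        "radiology": lambda q: [("rad report A", 0.92), ("rad report B", 0.88), ("rad report C", 0.41)],
        "pathology": lambda q: [("path report A", 0.80)],
    }
    print(domain_aware_retrieve("Chest X-ray: any effusion?", toy_classifier, toy_retrievers))
```

In the toy run, the score ratio 0.92/0.88 stays below the threshold, while 0.88/0.41 exceeds it, so only the first two radiology reports are kept. A ratio rule adapts the number of contexts to each query instead of fixing k globally.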
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Xia, P; Zhu, K; Li, H; Zhu, H; Li, Y; Li, G; Zhang, L; Yao, H (November 2024, EMNLP)

The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, too few retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references that interfere with the model’s generation. Second, in cases where the model originally responds correctly, applying RAG can lead to over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on three medical VQA datasets, achieving an average improvement of 20.8% in factual accuracy.
Full Text Available
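RULE's two components can be illustrated with a minimal sketch under stated assumptions. `certify_k` selects candidate numbers of retrieved contexts k on a labeled calibration set using a Learn-then-Test style hypothesis test, with Hoeffding p-values and a Bonferroni correction. This is one standard way to bound a risk with high probability and is substituted here for the paper's exact procedure. `build_preference_pairs` keeps samples where the model was correct without retrieval but wrong with it, pairing the non-RAG answer (preferred) against the RAG answer (rejected). The function names, the dictionary keys, and the prompt/chosen/rejected layout are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Mapping, Sequence


def hoeffding_p_value(empirical_risk: float, n: int, alpha: float) -> float:
    """p-value for H0: true risk > alpha, from Hoeffding's inequality.

    Assumes i.i.d. per-sample losses in [0, 1] (here 0/1 errors).
    """
    if empirical_risk >= alpha:
        return 1.0
    return math.exp(-2.0 * n * (alpha - empirical_risk) ** 2)


def certify_k(errors_by_k: Mapping[int, Sequence[int]], alpha: float, delta: float) -> List[int]:
    """Return every k whose factuality risk is certified to be <= alpha.

    errors_by_k[k][i] is 1 if the model answered calibration item i wrongly
    when given k retrieved contexts, else 0. A Bonferroni correction over all
    candidate k keeps the family-wise error rate at most delta.
    """
    m = len(errors_by_k)
    certified = []
    for k in sorted(errors_by_k):
        errs = errors_by_k[k]
        risk = sum(errs) / len(errs)
        if hoeffding_p_value(risk, len(errs), alpha) <= delta / m:
            certified.append(k)
    return certified


def build_preference_pairs(samples: Iterable[Mapping[str, str]]) -> List[Dict[str, str]]:
    """Keep cases where retrieval turned a correct answer into a wrong one.

    Each sample is assumed (hypothetically) to carry 'question', 'label',
    'answer_no_rag' and 'answer_rag'. The non-RAG answer becomes 'chosen'
    and the RAG answer 'rejected'.
    """
    pairs = []
    for s in samples:
        if s["answer_no_rag"] == s["label"] and s["answer_rag"] != s["label"]:
            pairs.append({
                "prompt": s["question"],
                "chosen": s["answer_no_rag"],
                "rejected": s["answer_rag"],
            })
    return pairs


if __name__ == "__main__":
    calib = {
        1: [1] * 40 + [0] * 160,  # empirical risk 0.20
        3: [1] * 20 + [0] * 180,  # empirical risk 0.10
        5: [1] * 30 + [0] * 170,  # empirical risk 0.15
    }
    print(certify_k(calib, alpha=0.25, delta=0.1))
```

With 200 calibration items, α = 0.25 and δ = 0.1, the corrected threshold is δ/3 ≈ 0.033. The p-values are exp(-1) ≈ 0.37 for k = 1, exp(-9) for k = 3 and exp(-4) ≈ 0.018 for k = 5, so only k = 3 and k = 5 are certified. Any certified k could then be used at inference time.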
